Optimize Threshold (Operator Toolbox)
Synopsis
This operator evaluates different thresholds and delivers a model with the best threshold attached.
Description
When solving a classification task, a model derives a confidence for each class. By default, RapidMiner assigns the prediction to the class with the highest confidence. In the binominal case, this means an example is assigned to a class when that class has a confidence above 0.5. This operator searches for the threshold that results in the highest accuracy and uses it in place of the default 0.5.
Input
- exa (Data Table)
The ExampleSet you want the thresholds to be optimized on.
- mod (Model)
The model used for scoring. Providing a model is not mandatory, but it is recommended so that you get a GroupedModel as a result. If you do not provide a model, the ExampleSet you provide must already be scored.
Output
- exa (Data Table)
The scored example set with the best thresholds applied.
- mod (Model)
If you provide a model as input, you receive a GroupedModel that combines the original model with the threshold model. You can use it to score new data and apply the threshold in one step. If you do not provide a model, you receive only the threshold model, which can be applied using Apply Model.
- per (Performance Vector)
The performance of the best threshold.
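The idea behind the threshold search can be illustrated outside RapidMiner with a minimal Python sketch. This is not the operator's actual implementation: the function names (accuracy_at, best_threshold, apply_threshold), the evenly spaced candidate grid, the strict "greater than" comparison, and the sample data are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def apply_threshold(confidences: Sequence[float], threshold: float) -> List[bool]:
    """Predict the positive class when its confidence exceeds the threshold.

    Assumption: a strict '>' comparison; the operator may handle ties differently.
    """
    return [c > threshold for c in confidences]


def accuracy_at(confidences: Sequence[float], labels: Sequence[int],
                threshold: float) -> float:
    """Fraction of examples classified correctly at the given threshold."""
    predictions = apply_threshold(confidences, threshold)
    correct = sum(p == bool(y) for p, y in zip(predictions, labels))
    return correct / len(labels)


def best_threshold(confidences: Sequence[float], labels: Sequence[int],
                   steps: int = 100) -> Tuple[float, float]:
    """Try evenly spaced thresholds in [0, 1] and return (threshold, accuracy).

    Assumption: a simple grid search. max() keeps the first maximum,
    so ties are resolved in favor of the lowest threshold.
    """
    candidates = (i / steps for i in range(steps + 1))
    scored = ((t, accuracy_at(confidences, labels, t)) for t in candidates)
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical confidences for the positive class and true labels (1 = positive).
    confidences = [0.55, 0.57, 0.62, 0.70, 0.30, 0.45]
    labels = [0, 0, 1, 1, 0, 0]

    print("accuracy at default 0.5:", accuracy_at(confidences, labels, 0.5))
    threshold, accuracy = best_threshold(confidences, labels)
    print("best threshold:", threshold, "accuracy:", accuracy)
    print("predictions:", apply_threshold(confidences, threshold))
```

In this made-up data, two negative examples have confidences slightly above 0.5, so the default threshold misclassifies them, while a somewhat higher threshold separates the classes. As with the operator, the threshold should be chosen on validation data rather than on the data used to report final performance.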
Tutorial Processes
Optimizing the Threshold of a GBT in an X-Validation
In this process we train a GBT (Gradient Boosted Trees) model in an X-Validation to predict survival in the Titanic disaster. We use the Optimize Threshold operator to optimize the survival threshold inside the X-Validation. As a result, only passengers with a survival confidence greater than 0.6 are predicted as Survived.